ABSTRACT

This paper provides the experimenter with one method
of performing several statistical tests when the data
distribution is not normal or is unknown. The method is
applied to simulated landing data for a lunar excursion
module.
INTRODUCTION

An error frequently committed in statistical analysis
of data obtained for reliability studies is to assume that
the population from which the data is taken has a normal
distribution when, in fact, it does not. One effect of
making such an error is that probabilities and tolerance
limits obtained by standard statistical techniques are invalid;
hence, if the reliability criterion is very stringent, the
conclusions reached might lead to disastrous consequences.

This paper is divided into three sections. The first 
section contains an example of the false conclusions that 
may be obtained when the data is erroneously assumed to be 
from a normal distribution. The second section contains four 
theorems that enable the experimenter to perform a reliability 
study when the distribution is not normal or is unknown. The 
third section illustrates the use of the theorems developed 
in Section II. 
SYMBOLS

$X, R, V_{ij}, Z$         Random variables, unless specified otherwise.

$n$                       Sample size.

$\xi_p$                   The (100 x p) percentage point of the
                          distribution of X.

$x_i$                     ith sample value of X.

$x(i)$                    ith ordered sample value of X.

$F(z)$                    Cumulative distribution function of X
                          [i.e., $F(z) = \Pr\{X \le z\}$].

$\binom{n}{r}$            Binomial coefficient equal to $\frac{n!}{r!\,(n-r)!}$.

$p, \alpha, \beta$        Probabilities.

All lower-case letters    Constants, unless specified otherwise.

$\mu$                     Mean.

$\sigma^2$                Variance.

$\Phi(x)$                 Cumulative distribution function of a
                          standardized normally-distributed random
                          variable.

$f(x)$                    Probability density function of X.

$S$                       Total number of observations $\le z$.

$I_\beta(k,m)$            Incomplete Beta function with parameters
                          k and m.


SECTION I - EXAMPLE OF ERROR


In many cases, reaction times have a log normal
distribution with parameters $\mu$ and $\sigma^2$; i.e., their
logarithms are normally distributed with mean $\mu$ and variance
$\sigma^2$. If an experimenter observes a sample of reaction times,
R, and estimates probabilities of R exceeding given values,
he incorporates serious errors into his estimates by assuming
that R is normally distributed. The magnitude of the error
can be best illustrated by the following example.

Table I shows 150 observations of a random variable, R,
having the log normal distribution, arranged in ascending order.
A number, t, is desired such that the probability of R
exceeding t is small, for instance, $1-\beta$, where $\beta$ is a number
close to 1.

If R is normally distributed and $\beta$ equals .9986, t would
be estimated by the familiar expression:

$$ t_{est} = \bar{R} + 3 S_R \qquad [1] $$

where $\bar{R}$ and $S_R$ are the sample mean and standard deviation
of the data. However, R is not normally distributed, and
estimation of t by equation [1] is erroneous. If R is
incorrectly assumed to be normally distributed, one would
obtain

$$ t_{incorrect} = \bar{R} + 3 S_R = .435 + 3(.219) = 1.092. $$



TABLE I - VALUES OF R ARRANGED IN ASCENDING ORDER

(Values increase down each column, then continue at the top of the next column.)

 .1420    .2572         .3356    .4214        .5997
 .1423    .2578         .3398    .4276        .6062
 .1459    .2585         .3433    .4301        .6079
 .1477    .2649         .3566    .4411        .6237
 .1503    .2658         .3570    .4447        .6361
 .1546    .2730         .3600    .4477        .6398
 .1558    .2771         .3604    .4620        .6442
 .1943    .2779         .3613    .4655        .6475
 .1982    .2805         .3621    .4678        .6479
 .2010    .2855         .3634    .4698        .6530
 .2056    .2885         .3635    .480?        .6601
 .2100    .2921         .3704    .4828        .6666
 .2127    .2921         .3708    .4835        .6681
 .2175    .2927         .3810    .4866        .6706
 .2183    .2935         .3812    .4936        .6780
 .2218    .2936         .3824    .4971        .6839
 .2321    .2981         .3827    .4993        .6945
 .2360    .3006         .3832    .5055        .8602
 .2373    .3028         .3912    .5076        .8624
 .2378    .3028         .3919    .5233        .8747
 .2378    .3052         .3934    .5379        .8825
 .2398    .3108         .4024    .5455        .8879
 .2414    .3139         .4066    .5460        .9177
 .2421    .3139         .4085    .5470        .9263
 .2429    .3149         .4091    .5564        .9456
 .2449    .3172         .4115    .5721        .9632
 .2456    .3173         .4128    .5803       1.0351
 .2504    .3268         .4146    .5837       1.1202
 .2508    .3333         .4158    .5854       1.1390
 .2512    [illegible]   .4193    [missing]   1.1928


$\bar{R} = .435$
$S_R = .219$


The magnitude of the error can be shown in two ways:

First, consider the true probability (not .9986) of R
not exceeding $t_{incorrect}$.
Since $\log R \sim N(\mu, \sigma^2)$, it follows that

$$ \Pr\{R \le t\} = \Phi\left(\frac{\log t - \mu}{\sigma}\right), $$

where $\Phi(\cdot)$ is the standardized normal distribution function.

The 150 observations in Table I came from a log normal
distribution with $\mu = -1$ and $\sigma = 1/2$. Therefore,

$$ \Pr\{R \le t_{incorrect}\} = \Phi\left(\frac{\log t_{incorrect} - (-1)}{1/2}\right) = \Phi(2.176) = .9852, $$

as compared with .9986. The probability, .9852, is
corroborated by the data. Note that, of the 150 observations,
3 exceed $t_{incorrect}$. If the actual probability of R exceeding
$t_{incorrect}$ were 1 - .9986 = .0014, it is extremely unlikely
that this event would occur as many as 3 times out of 150
trials.

Another way of determining the magnitude of error is to
compute the number $t_{correct}$ such that $\Pr\{R \le t_{correct}\}$ actually
is equal to .9986. Thus, $t_{correct}$ must satisfy

$$ \Phi\left(\frac{\log t_{correct} - (-1)}{1/2}\right) = .9986. \qquad [2] $$

Solving [2] yields $t_{correct} = e^{.5} = 1.6487$, a number considerably
higher than $t_{incorrect}$.
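The arithmetic of this example can be retraced with a short sketch. It assumes the parameter values quoted in the text ($\mu = -1$, $\sigma = 1/2$, $\bar{R} = .435$, $S_R = .219$); the function and variable names are illustrative only and are not part of the original paper. Note that the sketch uses the exact normal quantile for .9986, which is slightly below 3, so its value of $t_{correct}$ comes out a little under the figure obtained with the rounded quantile.

```python
from math import exp, log
from statistics import NormalDist

# Values quoted in the text (assumed here, not re-estimated from Table I).
MU, SIGMA = -1.0, 0.5        # parameters of log R
R_BAR, S_R = 0.435, 0.219    # sample mean and standard deviation of R
BETA = 0.9986

std_normal = NormalDist()    # standardized normal distribution


def lognormal_cdf(t, mu=MU, sigma=SIGMA):
    """Pr{R <= t} when log R is normal with mean mu and std. dev. sigma."""
    return std_normal.cdf((log(t) - mu) / sigma)


# Limit obtained under the (wrong) normality assumption, equation [1].
t_incorrect = R_BAR + 3 * S_R

# True probability that R does not exceed that limit.
p_true = lognormal_cdf(t_incorrect)

# Limit that actually gives probability BETA under the log normal law.
t_correct = exp(MU + SIGMA * std_normal.inv_cdf(BETA))

print(f"t_incorrect = {t_incorrect:.3f}")
print(f"Pr(R <= t_incorrect) = {p_true:.4f}")
print(f"t_correct = {t_correct:.4f}")
```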




SECTION II - THEOREMS 


Suppose X is an observable random variable. From the
failure analysis viewpoint, it might be desirable to estimate
percentage points and tolerance limits for X. A percentage
point, $\xi_p$, is a number such that the probability of X
exceeding $\xi_p$ is equal to $1-p$. Tolerance limits for X define
an interval [x(i), x(j)]. This interval is such that the
probability is $1-\alpha$ that, at least $100\beta$ percent of the time,
$x(i) \le X \le x(j)$, where $1-\alpha$ is the chosen level of confidence,
and $\beta$ is any arbitrary positive number less than 1.

Let $x_1, x_2, \ldots, x_n$ be a sample of n independent
observations of X, and suppose F(z), the cumulative
distribution function of X, is continuous and strictly
increasing over the range of interest. If x(1), x(2), ...,
x(n) denotes the observed sample arranged in ascending order
(that is, $x(i) \le x(j)$ for $i < j$), then the four following
theorems hold:

THEOREM 1: If z is any real number, then

$$ \Pr\{x(i) \le z\} = \sum_{r=i}^{n} \binom{n}{r} [F(z)]^r [1-F(z)]^{n-r}. $$

Proof:

For a given observation of X, the event $\{X \le z\}$ has
probability F(z). Let S equal the total number of observations
of X less than or equal to z. Then, S has the binomial
distribution with parameters n and F(z). Thus,

$$ \Pr\{S \ge i\} = \sum_{r=i}^{n} \binom{n}{r} [F(z)]^r [1-F(z)]^{n-r}. $$

But, $S \ge i$ means that there are at least i observations
less than or equal to z. This is equivalent to stating $x(i) \le z$.
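As a minimal sketch of THEOREM 1, the binomial sum can be evaluated directly and compared with a simulation. The uniform distribution on (0, 1) is chosen here only for convenience, since then F(z) = z; the function names, the seed, and the trial count are assumptions made for this illustration.

```python
import random
from math import comb


def order_stat_cdf(i, n, F_z):
    """Pr{x(i) <= z} from THEOREM 1, where F_z is the value of F(z)."""
    return sum(comb(n, r) * F_z**r * (1 - F_z) ** (n - r)
               for r in range(i, n + 1))


def simulate_order_stat_cdf(i, n, z, trials=20000, seed=0):
    """Monte Carlo estimate of Pr{x(i) <= z} for Uniform(0, 1) samples."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        sample = sorted(rng.random() for _ in range(n))
        if sample[i - 1] <= z:   # x(i) is the i-th smallest value
            hits += 1
    return hits / trials


exact = order_stat_cdf(3, 5, 0.4)
estimate = simulate_order_stat_cdf(3, 5, 0.4)
print(f"theorem: {exact:.5f}   simulation: {estimate:.5f}")
```

Because the theorem depends on the distribution only through F(z), the same function applies to any continuous distribution once F(z) is known.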

THEOREM 2: If i and j are chosen before observing the data
such that $1 \le i < j \le n$, then [x(i), x(j)] is a
confidence interval, independent of F, for $\xi_p$,
the 100 x p percentage point of the distribution
of X. Specifically, the level of confidence
equals

$$ \Pr\{x(i) \le \xi_p \le x(j)\} = \sum_{r=i}^{n} \binom{n}{r} p^r (1-p)^{n-r} - \sum_{r=j}^{n} \binom{n}{r} p^r (1-p)^{n-r}. $$

Proof:

Since F is continuous and strictly increasing in the
range of interest, $\xi_p$ is uniquely defined for a given p
in that range.

$$ \Pr\{x(i) \le \xi_p\} = \Pr\{x(i) \le \xi_p,\ x(j) < \xi_p\} + \Pr\{x(i) \le \xi_p,\ x(j) \ge \xi_p\} $$

$$ = \Pr\{x(j) < \xi_p\} + \Pr\{x(i) \le \xi_p \le x(j)\} $$

since $x(i) \le x(j)$. Therefore,

$$ \Pr\{x(i) \le \xi_p\} - \Pr\{x(j) < \xi_p\} = \Pr\{x(i) \le \xi_p \le x(j)\}. $$

Since F is continuous,

$$ \Pr\{x(i) \le \xi_p\} - \Pr\{x(j) \le \xi_p\} = \Pr\{x(i) \le \xi_p \le x(j)\}. $$

Hence, from THEOREM 1, it follows that

$$ \Pr\{x(i) \le \xi_p \le x(j)\} = \sum_{r=i}^{n} \binom{n}{r} [F(\xi_p)]^r [1-F(\xi_p)]^{n-r} - \sum_{r=j}^{n} \binom{n}{r} [F(\xi_p)]^r [1-F(\xi_p)]^{n-r} $$

$$ = \sum_{r=i}^{n} \binom{n}{r} p^r (1-p)^{n-r} - \sum_{r=j}^{n} \binom{n}{r} p^r (1-p)^{n-r} $$

since $\xi_p$ is defined so that $F(\xi_p) = p$.

THEOREM 3: Let f(x) be the probability density function of X
and let the random variable $V_{ij}$ be the area under
f(x) between x(i) and x(j) (i < j). Then $V_{ij}$
equals the probability that X lies between x(i)
and x(j), and the density function of $V_{ij}$ is given by

$$ h(v_{ij}) = \frac{n!}{(j-i-1)!\,(n-j+i)!}\; v_{ij}^{\,j-i-1} (1-v_{ij})^{n-j+i}. $$

THEOREM 4: The probability that $100\beta$ percent, or more, of
X will be in the tolerance interval [x(i), x(j)]
(that is, $\Pr\{V_{ij} \ge \beta\}$) is given by

$$ 1 - \sum_{r=j-i}^{n} \binom{n}{r} \beta^r (1-\beta)^{n-r}. $$

Proof:

Let $V_{ij} = \int_{x(i)}^{x(j)} f(z)\,dz$. Then, $\Pr\{V_{ij} \ge \beta\} = \int_{\beta}^{1} h(v)\,dv$.

But, by THEOREM 3,

$$ h(v_{ij}) = \frac{n!}{(j-i-1)!\,(n-j+i)!}\; v_{ij}^{\,j-i-1} (1-v_{ij})^{n-j+i}. $$

Hence,

$$ \Pr\{V_{ij} \ge \beta\} = \frac{n!}{(j-i-1)!\,(n-j+i)!} \int_{\beta}^{1} v_{ij}^{\,j-i-1} (1-v_{ij})^{n-j+i}\,dv_{ij} = 1 - I_\beta[(j-i),\,(n-j+i+1)], $$

where $I_\beta(k,m)$ is the Incomplete Beta function. The quantity
$I_\beta[(j-i),\,(n-j+i+1)]$ can be obtained from the binomial
distribution by the following relationship:

$$ I_\beta[(j-i),\,(n-j+i+1)] = \sum_{r=j-i}^{n} \binom{n}{r} \beta^r (1-\beta)^{n-r}. $$
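The binomial form of THEOREM 4 is straightforward to evaluate numerically. The following is a sketch under the assumption that only the difference j - i (called `span` here) matters, as the formula indicates; the function name and the example arguments are illustrative and are not taken from the program mentioned in the paper.

```python
from math import comb


def tolerance_confidence(n, span, beta):
    """Pr{V_ij >= beta} from THEOREM 4, where span = j - i.

    Returns the confidence that at least a fraction beta of the
    distribution lies between x(i) and x(j) for a sample of size n.
    """
    tail = sum(comb(n, r) * beta**r * (1 - beta) ** (n - r)
               for r in range(span, n + 1))
    return 1 - tail


# Example: n = 122 observations, j - i = 109, beta = .85.
print(f"{tolerance_confidence(122, 109, 0.85):.4f}")
```

Scanning `span` downward from n for a fixed `beta` gives one column of confidence levels, from which the largest acceptable j - i can be read off.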
SECTION III - APPLICATION


For the lunar excursion module to land safely, it is
necessary that certain end conditions not be excessive. One
of these end conditions is the vertical component of velocity,
Z. Table II gives values of Z obtained from 122 independent
lunar landing simulations. Statistical tests reject the
hypothesis that these values came from a normal or any other
well-known distribution. (See Ref. 4; Kolmogorov-Smirnov
Goodness of Fit Test.) Therefore, in order to estimate
percentage points and tolerance limits of this unknown
distribution, it is necessary to use a distribution-free
(nonparametric) procedure. It is clear that the range of Z
is an interval on the real line; hence, the conditions of
SECTION II are satisfied.

A. ESTIMATION OF $\xi_p$

Suppose it is desired to estimate $\xi_{.95}$. By Theorem 2,
any interval of the form [Z(i), Z(j)] is a confidence interval
for $\xi_{.95}$. However, i and j should be chosen so that a
reasonable confidence level is attained; that is, it is
advantageous to have

$$ \Pr\{Z(i) \le \xi_{.95} \le Z(j)\} = 1-\alpha, $$

where $\alpha$ is a small probability. In other words, $\alpha$
is the probability that the true value of $\xi_{.95}$ lies
outside the interval of estimation. For example, if i
is chosen to be 111, and j to be 120, then x(i) = 8.88,
x(j) = 16.26, and it follows that

$$ \Pr\{8.88 \le \xi_{.95} \le 16.26\} = \sum_{r=111}^{122} \binom{122}{r} (.95)^r (.05)^{122-r} - \sum_{r=120}^{122} \binom{122}{r} (.95)^r (.05)^{122-r} $$

$$ = .9805 - .0534 = .9271. $$
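The two binomial sums in this example can be checked with a few lines of code. This is a sketch using the values given in the text (n = 122, i = 111, j = 120, p = .95); the helper names are assumptions made for the illustration. With exact binomial arithmetic, the result should land close to the .9271 quoted above, with small differences in the last digits attributable to rounding of the individual sums.

```python
from math import comb


def binomial_upper_tail(n, k, p):
    """Sum over r = k..n of C(n, r) p^r (1-p)^(n-r)."""
    return sum(comb(n, r) * p**r * (1 - p) ** (n - r)
               for r in range(k, n + 1))


def percentile_confidence(n, i, j, p):
    """Confidence that x(i) <= xi_p <= x(j), per Theorem 2."""
    return binomial_upper_tail(n, i, p) - binomial_upper_tail(n, j, p)


first_sum = binomial_upper_tail(122, 111, 0.95)
second_sum = binomial_upper_tail(122, 120, 0.95)
confidence = percentile_confidence(122, 111, 120, 0.95)
print(f"{first_sum:.4f} - {second_sum:.4f} = {confidence:.4f}")
```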


Since it is of no concern in this particular problem
if the true value of $\xi_{.95}$ is less than Z(i), the interval
in equation [2] may be changed to a one-sided form,
$(-\infty, Z(j)]$. In this case, equation [2] reduces to

$$ \Pr\{\xi_p \le x(j)\} = 1 - \sum_{r=j}^{n} \binom{n}{r} p^r (1-p)^{n-r} = 1-\alpha. $$

Returning to the given example, it follows that

$$ \Pr\{\xi_{.95} \le 16.26\} = 1 - .0534 = .9466. $$

B. MAXIMUM CONFIDENCE LEVEL

Note that as j increases, $\alpha$ decreases until the
maximum confidence level of $1-p^n$ is attained if j = n.
For this reason, when p is very close to 1 and n is not
very large, any attempt to estimate $\xi_p$ results in a very
low confidence level.

A rough estimate of a desirable n for a given p
may be obtained using the relation that, for n > 100,
$1-p^n \approx 1-e^{-n(1-p)}$. If it is stipulated that the maximum
confidence level should be $1-\alpha$, then n must be determined
such that $1-e^{-n(1-p)} \approx 1-\alpha$. In other words, let

$$ n \approx \frac{-\log \alpha}{1-p}. $$

EXAMPLE:

It is desired to find a sample size that could be
used for estimating $\xi_{.9999}$ with a maximum confidence of
.99.

SOLUTION:

Let n be approximately equal to

$$ \frac{-\log(.01)}{.0001} \approx 46050. $$
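The approximation can be compared with the exact requirement $1-p^n \ge 1-\alpha$. The sketch below assumes natural logarithms, as the relation $p^n \approx e^{-n(1-p)}$ implies; the function names are illustrative only.

```python
from math import ceil, log


def approx_sample_size(p, alpha):
    """Rough n from the relation 1 - p^n ~ 1 - exp(-n(1-p))."""
    return -log(alpha) / (1 - p)


def exact_sample_size(p, alpha):
    """Smallest integer n such that 1 - p^n >= 1 - alpha."""
    return ceil(log(alpha) / log(p))


p, alpha = 0.9999, 0.01
n_approx = approx_sample_size(p, alpha)
n_exact = exact_sample_size(p, alpha)
print(f"approximate n = {n_approx:.0f}, exact n = {n_exact}")
```

For p this close to 1 the two values differ by only a few observations, which is why the rough relation is adequate for planning purposes.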


C. TOLERANCE LIMITS

Suppose it is necessary to determine the following
sets of tolerance limits for the data given in Table II.

1. Determine i and j such that:

a. The probability is .90 (that is, $1-\alpha = .90$), that

b. At least 85% of the time, Z lies between x(i) and
x(j) ($\beta = .85$).

2. Determine i and j such that:

a. The probability ($1-\alpha$) = .93, that at least

b. 90% ($\beta = .90$) of the time Z lies between x(i)
and x(j).

3. Determine j such that:

a. The probability is .94, that at least

b. 85% of the time Z will be less than x(j).

4. Determine i and j such that:

a. The probability is .999, that at least

b. 99.865% of the time Z will be between x(i) and
x(j).

Although these tolerance limits can be obtained
by a direct application of Theorem 4, a computer program
has been written providing the necessary information in
tabular form. The output of this program is presented
in Table III. (The computer program that generates Table
III is available from the Computation and Analysis
Division.)

To construct the set of tolerance limits in example
C.1, read down the .85 Beta column to $1-\alpha = .90$ (or
the number closest to .90). Then read the corresponding
entry in the J-I column, which is 109, indicating that
the x(i) and x(j) used for the tolerance limits are
such that j-i = 109. Hence, any of the following sets
of x(i) and x(j) could be used to satisfy the desired
tolerance limits C.1: [x(1), x(110)]; [x(2), x(111)];
[x(3), x(112)], etc.

Suppose the experimenter desired to use x(5) and
x(114); he could assume that the probability is .8916
that at least 85% of the time Z would lie between 1.20
and 9.48.
TABLE III - TOLERANCE LIMITS

Confidence level $1-\alpha$ for N = 122

                               BETA
 J-I     .85000       .90000       .95000       .97500       .99865

 122   1.00000000    .99999738    .99808452    .95444217    .15711072
 121    .99999995    .99996193    .98578505    .81192790
 120    .99999939    .99972358    .94662098    .59084808
 119    .99999541    .99866425    .86417030    .36409954
 118    .99997454    .99516255    .73506989    .19113110
 117    .99988764    .98598032    .57471360
 116    .99958859    .96608552    .41013740
 115    .99871404    .92945379    .26659726
 114    .99649551    .87094480    .15799781
 113    .99153645    .78859880
 112    .98164752    .68520883
 111    .96387916    .56824239
 110    .93487493    .44802688
 109    .89156544    .33500376
 108    .83206039    .23722979
 107    .75645397    .15901061
 106    .66722728
 105    .56904704
 104    .46797915
 103    .37035320
 102    .28162844
 101    .20557864
 100    .14396611


In example C.2, the set of tolerance limits is read
from the table to be x(i) and x(j) such that j-i = 115.
In C.3, a one-sided case, the x(j) chosen is such that
j = 110. This means that the probability is .93 that
at least 85% of the time Z will be less than 8.82. Note
that the last set of tolerance limits (example C.4)
does not exist for this set of data. That is, there is
no i and j such that the probability is .999 that at
least 99.865% of the time Z will be between x(i) and
x(j).

D. SAMPLE SIZE

To find a set of tolerance limits as described in
example C.4, a sample size of approximately 8845
observations would be necessary. The following equation
provides an approximation to the number of observations
required for a given $\beta$ and a given confidence level:

$$ n \approx \frac{A}{4}\left(\frac{1+\beta}{1-\beta}\right) + \frac{1}{2} $$

where A is the $(1-\alpha)$ percentage point of the $\chi^2$ distribution
with four degrees of freedom, $\beta$ is the probability
that Z will lie between x(i) and x(j), and $1-\alpha$ is
the desired confidence level. (In example C.4,
A = 18.5, $\beta$ = .99865, and $1-\alpha$ = .999.)

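E. POISSON APPROXIMATION TO THE BINOMIAL SUM

For large n and $\beta$ close to 1, the sum

$$ \sum_{r=j-i}^{n} \binom{n}{r} \beta^r (1-\beta)^{n-r} $$

can be approximated by

$$ \sum_{r=0}^{n-(j-i)} \frac{e^{-\lambda} \lambda^r}{r!}, $$

where $\lambda = n(1-\beta)$.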
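A quick numerical comparison shows how close the two sums are. This sketch treats the number of observations falling outside the interval as approximately Poisson with mean $\lambda = n(1-\beta)$, so the Poisson sum runs from 0 to n - (j - i); the function names and the example arguments are assumptions made for the illustration.

```python
from math import comb, exp, factorial


def binomial_sum(n, span, beta):
    """Exact sum over r = span..n of C(n, r) beta^r (1-beta)^(n-r)."""
    return sum(comb(n, r) * beta**r * (1 - beta) ** (n - r)
               for r in range(span, n + 1))


def poisson_sum(n, span, beta):
    """Poisson approximation with lambda = n(1 - beta)."""
    lam = n * (1 - beta)
    return sum(exp(-lam) * lam**r / factorial(r)
               for r in range(0, n - span + 1))


for n, span, beta in [(122, 120, 0.95), (1000, 998, 0.999)]:
    exact = binomial_sum(n, span, beta)
    approx = poisson_sum(n, span, beta)
    print(f"n={n}, j-i={span}, beta={beta}: "
          f"exact={exact:.5f}, Poisson={approx:.5f}")
```

The agreement improves as n grows and $\beta$ approaches 1, which is the regime named in the text.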


F. TESTING FOR NORMALITY

One method of testing the data for normality is to
use the Kolmogorov-Smirnov test. This test is available
in a computer program from the Computation and Analysis
Division (Ref. 6).